An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Authors

Amir Salarpour Computer Engineering, Bu-Ali Sina University

Hassan Khotanlou Computer, Bu-Ali Sina University

Abstract:

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of MTS analysis. Much of the research in this area has focused on proposing novel similarity measures for the underlying data. However, despite the countless techniques for estimating similarity between MTS, the field lacks comparative studies based on quantitative, large-scale evaluations. To provide a comprehensive validation, an extensive evaluation of similarity measures for MTS clustering was conducted. Fourteen well-known similarity measures and their variants were re-implemented, and their effectiveness was tested on 23 MTS datasets drawn from a wide variety of application domains. This paper gives an overview of these techniques and presents an empirical comparison of their effectiveness on an agglomerative clustering task. Furthermore, statistical significance tests were used to derive meaningful conclusions. The similarity measures were found to be equivalent in terms of clustering F-measure, with no significant difference between them on the datasets used. The results provide a comparative baseline for choosing the most suitable method in terms of performance and computation time.
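The evaluation pipeline described in the abstract (a similarity measure, agglomerative clustering, then a clustering F-measure) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not list the 14 measures, so dependent multivariate dynamic time warping (DTW) is used here as one plausible example; average linkage and the class-weighted F-measure definition are likewise assumptions, and all function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def dtw_distance(a, b):
    """Dependent multivariate DTW between two series (lists of equal-dimension vectors)."""
    n, m = len(a), len(b)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between two time points
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def agglomerative(series, k, dist=dtw_distance):
    """Average-linkage agglomerative clustering (assumed linkage); returns one label per series."""
    n = len(series)
    d = [[dist(series[i], series[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for x, y in combinations(range(len(clusters)), 2):
            total = sum(d[i][j] for i in clusters[x] for j in clusters[y])
            avg = total / (len(clusters[x]) * len(clusters[y]))
            if best is None or avg < best[0]:
                best = (avg, x, y)
        _, x, y = best
        clusters[x] += clusters[y]  # merge y into x (x < y, so deleting y is safe)
        del clusters[y]
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def clustering_f_measure(truth, pred):
    """Class-size-weighted best F1 per true class (one common definition, assumed here)."""
    n = len(truth)
    score = 0.0
    for c in set(truth):
        members = {i for i in range(n) if truth[i] == c}
        best = 0.0
        for g in set(pred):
            group = {i for i in range(n) if pred[i] == g}
            overlap = len(members & group)
            if overlap:
                p, r = overlap / len(group), overlap / len(members)
                best = max(best, 2 * p * r / (p + r))
        score += len(members) / n * best
    return score

# Hypothetical toy data: four 2-variable series of unequal length, two classes.
toy = [
    [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)],
    [(0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (0.3, 0.1)],
    [(5.0, 5.0), (5.1, 5.2), (5.2, 5.1)],
    [(5.0, 5.1), (5.2, 5.2), (5.1, 5.0)],
]
truth = [0, 0, 1, 1]
labels = agglomerative(toy, k=2)
score = clustering_f_measure(truth, labels)
```

Passing a different function as `dist` is how other similarity measures could be compared under the same clustering and scoring steps; a real study would also need significance testing across datasets, which this sketch omits.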

Similar resources

Distance Measures for Effective Clustering of ARIMA Time-Series

Many environmental and socioeconomic time–series data can be adequately modeled using Auto-Regressive Integrated Moving Average (ARIMA) models. We call such time–series ARIMA time–series. We consider the problem of clustering ARIMA time–series. We propose the use of the Linear Predictive Coding (LPC) cepstrum of time–series for clustering ARIMA time–series, by using the Euclidean distance betwe...

An Empirical Evaluation of Similarity Measures for Time Series Classification

Time series are ubiquitous, and a measure to assess their similarity is a core part of many computational systems. In particular, the similarity measure is the most essential ingredient of time series clustering and classification systems. Because of this importance, countless approaches to estimate time series similarity have been proposed. However, there is a lack of comparative studies using...

Clustering and Visualization of Multivariate Time Series

The analysis of MTS is an established research area, and methods to carry it out have stemmed both from traditional statistics and from the Machine Learning and Computational Intelligence fields. In this chapter, we are mostly interested in the latter, but considering a mixed approach that can be ascribed to Statistical Machine Learning. MTS are often analyzed for prediction and forecasting and...

Clustering of Multivariate Time-Series Data

A new methodology for clustering multivariate time-series data is proposed. The methodology is based on calculation of the degree of similarity between multivariate time-series datasets using two similarity factors. One similarity factor is based on principal component analysis and the angles between the principal component subspaces while the other is based on the Mahalanobis distance between ...

An Experiment with Distance Measures for Clustering

Distance measures play an important role in clustering data points. Choosing the right distance measure for a given dataset is a non-trivial problem. In this paper, we study various distance measures and their effect on different clustering techniques. In addition to the standard Euclidean distance, we use Bit-Vector based, Comparative Clustering based, Huffman code based and Dominance based di...

Clustering time series under the Fréchet distance

The Fréchet distance is a popular distance measure for curves. We study the problem of clustering time series under the Fréchet distance. In particular, we give (1 + ε)-approximation algorithms for variations of the following problem with parameters k and ℓ. Given n univariate time series P, each of complexity at most m, we find k time series, not necessarily from P, which we call cluster cen...

Journal title

International Journal of Engineering

Volume 31, Issue 2

Pages 250-262

Publication date 2018-02-01

Keywords

Multivariate time series; Similarity measures; Clustering evaluation